ABSTRACT

As a response to the growing importance of computational approaches and the accelerating pace of science, I propose in this position paper to establish the concept of “science bots” that autonomously perform programmed tasks on the input data they encounter and immediately publish the results. We can let such bots participate in a reputation system together with human users, meaning that bots and humans receive positive or negative feedback from other participants. Positive reputation given to these bots would also reflect on their owners, motivating them to contribute to this system, while negative reputation would allow us to filter out the low-quality data that are inevitable in an open and decentralized system.
1. INTRODUCTION

As datasets become increasingly important in all branches of science, many have proposed methods and tools to publish data [3, 10]. Nanopublications are an approach to bundle atomic data snippets (in RDF) in small packages together with their provenance and metadata. Such nanopublications can be created manually by scientists and linked to their articles, but they can also be extracted automatically from existing datasets or be created directly by programs that implement scientific methods.
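To make the structure of a nanopublication concrete, here is a minimal Python sketch using only the standard library. It assumes the commonly used layout of four named graphs (head, assertion, provenance, and publication info) and writes them out as N-Quads. The example IRIs, the graph-name suffixes, and the names `Nanopublication`, `term`, and `to_nquads` are hypothetical; real tooling would rely on a proper RDF library.

```python
from dataclasses import dataclass, field

NP = "http://www.nanopub.org/nschema#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def term(value: str) -> str:
    """Write an IRI as <...>; values that already start with '"' are literals."""
    return value if value.startswith('"') else f"<{value}>"


@dataclass
class Nanopublication:
    uri: str  # hypothetical identifier of the nanopublication
    assertion: list[Triple] = field(default_factory=list)
    provenance: list[Triple] = field(default_factory=list)
    pubinfo: list[Triple] = field(default_factory=list)

    def to_nquads(self) -> str:
        # Graph-name suffixes follow a common convention; they are an assumption here.
        head, a, p, i = (self.uri + s for s in ("#Head", "#assertion", "#provenance", "#pubinfo"))
        # The head graph ties the three content graphs to the nanopublication.
        head_triples = [
            (self.uri, RDF_TYPE, NP + "Nanopublication"),
            (self.uri, NP + "hasAssertion", a),
            (self.uri, NP + "hasProvenance", p),
            (self.uri, NP + "hasPublicationInfo", i),
        ]
        graphs = [(head, head_triples), (a, self.assertion),
                  (p, self.provenance), (i, self.pubinfo)]
        # One N-Quads line per triple: subject predicate object graph .
        return "\n".join(
            f"{term(s)} {term(pr)} {term(o)} <{g}> ."
            for g, triples in graphs
            for s, pr, o in triples
        )


EX = "http://example.org/"  # hypothetical namespace
np1 = Nanopublication(
    uri=EX + "np1",
    assertion=[(EX + "gene1", EX + "isRelatedTo", EX + "disease1")],
    provenance=[(EX + "np1#assertion", PROV + "wasDerivedFrom", EX + "dataset7")],
    pubinfo=[(EX + "np1", PROV + "wasAttributedTo", EX + "someBot")],
)
print(np1.to_nquads())
```

The assertion carries the actual data snippet, while provenance and publication info stay machine-readable alongside it, which is what allows programs to create and consume such packages without human mediation.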

In general, computer programs form a third kind of scientific contribution, besides narrative articles and datasets. While many such programs are openly available, there are no conventions or standards for how to reliably link data to the software that produced it, including the version of the software and the input it received. Moreover, because the scientific life cycle focuses on the publication of articles, scientific software is typically applied only to the data available at the time a paper is written. New output data is often not made public when new input data becomes available. To tackle these problems, I argue that we can encapsulate certain types of scientific algorithms as small independent agents that take inputs of a given type and produce outputs such as nanopublications, automatically and in real time as new input data becomes available.
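The following minimal Python sketch illustrates the kind of agent meant here. The class `ScienceBot`, its method `run`, and the `Output` record are hypothetical names. The bot remembers which inputs it has already handled, processes only new ones, and attaches the software name, version, and input identifier to every result, so that each output stays linked to the program and input that produced it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Output:
    value: object
    software: str  # name of the producing program
    version: str   # exact version that produced this output
    input_id: str  # identifier of the input it was computed from


class ScienceBot:
    """Hypothetical bot: applies one method to every input it has not seen yet."""

    def __init__(self, name: str, version: str, method: Callable[[object], object]):
        self.name, self.version, self.method = name, version, method
        self.processed: set[str] = set()

    def run(self, inputs: Iterable[tuple[str, object]]) -> list[Output]:
        outputs = []
        for input_id, data in inputs:
            if input_id in self.processed:
                continue  # a result for this input was already published
            self.processed.add(input_id)
            outputs.append(Output(self.method(data), self.name, self.version, input_id))
        return outputs


bot = ScienceBot("word-counter", "1.0.2", lambda text: len(text.split()))
first = bot.run([("doc1", "a b c"), ("doc2", "d e")])
second = bot.run([("doc1", "a b c"), ("doc3", "f g h i")])  # only doc3 is new
print([(o.input_id, o.value) for o in second])  # [('doc3', 4)]
```

In a real deployment, `run` would be triggered whenever a data source announces new items, and each `Output` would be published (for example, as a nanopublication) instead of returned.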

2. BACKGROUND 

I borrow the term “bot” from Wikipedia, where bots are applied, for example, to revert edits that result from vandalism [^. A prominent example is a bot that has created around 454 000 articles for the Swedish Wikipedia [^. The fact that bots can also be powerful in a negative sense has become apparent with the rise of botnets [13] and with the increasing problem of “social bots” that pretend to be humans [^. I argue here that the power of bots could also be harnessed in a positive way for scientific computation. In contrast to the agents in the original Semantic Web paper, such bots would not propose or make decisions as a kind of personal assistant; they would only publish data snippets, and they would do so without any further interaction with humans.

In previous work, I showed how the concept of nanopublications can be extended, and I mentioned the possible use of bots to create them [^. I also presented an approach to attach cryptographic hash values to nanopublication identifiers to make them verifiable and immutable [11]. Based on that work, I have started to establish a nanopublication server network, with which nanopublications can be published, retrieved, and archived in a reliable, trustworthy, and decentralized manner, and which could serve as the basis for communication among bots.
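As a simplified illustration of hash-based identifiers, the Python sketch below appends a SHA-256 hash of the content to a base URI and verifies an identifier by recomputing that hash. The normalization (sorted, tab-separated triples) and the function names `content_hash`, `make_identifier`, and `verify` are assumptions made for this sketch; the published approach defines its own normalization and encoding.

```python
import base64
import hashlib

Triple = tuple[str, str, str]


def content_hash(triples: list[Triple]) -> str:
    """SHA-256 over a simplified normalization: sorted, tab-separated triples."""
    normalized = "\n".join(sorted("\t".join(t) for t in triples))
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    # URL-safe Base64 without '=' padding, so the hash can sit inside a URI.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_identifier(base_uri: str, triples: list[Triple]) -> str:
    return base_uri + content_hash(triples)


def verify(identifier: str, triples: list[Triple]) -> bool:
    """Recompute the hash from the content and compare it with the identifier's suffix."""
    return identifier.endswith(content_hash(triples))


data = [("ex:gene1", "ex:isRelatedTo", "ex:disease1")]
uri = make_identifier("http://example.org/np/", data)
print(verify(uri, data))                                              # True
print(verify(uri, [("ex:gene1", "ex:isRelatedTo", "ex:disease2")]))  # False
```

Because anyone can recompute the hash from the retrieved content, a copy obtained from any server in a decentralized network can be checked without trusting that server.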

3. APPROACH 

In this position paper, I propose bots as a general concept for scientific computation. For example, a bot could apply text mining to extract relations from the abstracts of the constantly growing PubMed database, another bot could regularly measure the temperature at a given location and publish the results, and yet another one could infer new facts from existing nanopublications by applying specified rules or heuristics (e.g., if disease X is related to gene Y, which is targeted by drug Z, then Z might help to treat X). Importantly, these bots can automatically publish the obtained data without double-checking or direct supervision by their creators, and these data can be made immediately accessible to everybody (including other bots).
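The rule-based bot from the last example could look roughly like the following Python sketch. The predicate names `isRelatedTo`, `targets`, and `mightHelpTreat` and the function name are hypothetical. The sketch applies the single rule from the text to a set of triples and returns only statements that are not already known.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def infer_treatment_candidates(facts: set[Triple]) -> set[Triple]:
    """Rule: X isRelatedTo Y and Z targets Y  =>  Z mightHelpTreat X."""
    # Index drugs by the gene they target, so each disease-gene fact is a lookup.
    drugs_by_gene: dict[str, set[str]] = defaultdict(set)
    for s, p, o in facts:
        if p == "targets":
            drugs_by_gene[o].add(s)
    inferred = {
        (drug, "mightHelpTreat", disease)
        for disease, p, gene in facts
        if p == "isRelatedTo"
        for drug in drugs_by_gene.get(gene, ())
    }
    return inferred - facts  # only publish genuinely new statements


facts = {
    ("diseaseX", "isRelatedTo", "geneY"),
    ("drugZ", "targets", "geneY"),
    ("drugW", "targets", "geneV"),
}
print(infer_treatment_candidates(facts))  # {('drugZ', 'mightHelpTreat', 'diseaseX')}
```

Run repeatedly over newly published nanopublications, such a bot would emit each inferred statement as soon as both premises become available.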


In a system that treats bots as first-class citizens, we have to expect that some bots (and humans, for that matter) will produce low-quality contributions, and we have to make sure that this does not affect the reliability and trustworthiness of the system. I argue that we can achieve that without introducing a central authority, without making concessions with respect to the openness of the system, and without delaying the publication of results. We simply need a sufficiently accurate automatic method to discern good contributions from bad ones, which a reputation system can provide. We can let scientists and bots participate in the same reputation system, where they would increase their reputation by receiving positive feedback from other participants on the usefulness and quality of their contributions. Positive reputation of a bot, in turn, would give credit and reputation to the scientist who created it.

To arrive at a simple exemplary model to explain the approach, we can define a relation “is contributed by”, where bots can occur on either side: they are contributions, as they were programmed and created by somebody, but they are also contributors, as they can create new digital entities on their own. We can define a second type of relation to represent assessments. For the sake of simplicity, we model only positive assessments here and strip them of all granularity and detail; we call the resulting relation “gives positive assessment for”.
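A minimal Python sketch of this model might represent both relations as labeled edges. The names `Relation`, `Edge`, `contributors`, and `contributions` and the node identifiers are hypothetical. The point is that a bot such as `bot1` appears as the source of one “is contributed by” edge (it is a contribution) and as the target of another (it is a contributor).

```python
from enum import Enum
from typing import NamedTuple


class Relation(Enum):
    CONTRIBUTED_BY = "is contributed by"
    POSITIVE_ASSESSMENT = "gives positive assessment for"


class Edge(NamedTuple):
    source: str
    relation: Relation
    target: str


def contributors(node: str, edges: list[Edge]) -> set[str]:
    """Who created `node` (targets of its 'is contributed by' edges)."""
    return {e.target for e in edges
            if e.source == node and e.relation is Relation.CONTRIBUTED_BY}


def contributions(node: str, edges: list[Edge]) -> set[str]:
    """What `node` created (sources of 'is contributed by' edges pointing to it)."""
    return {e.source for e in edges
            if e.target == node and e.relation is Relation.CONTRIBUTED_BY}


edges = [
    Edge("bot1", Relation.CONTRIBUTED_BY, "alice"),     # alice programmed bot1
    Edge("dataset1", Relation.CONTRIBUTED_BY, "bot1"),  # bot1 produced dataset1
    Edge("bob", Relation.POSITIVE_ASSESSMENT, "dataset1"),
]
print(contributors("bot1", edges))   # {'alice'}
print(contributions("bot1", edges))  # {'dataset1'}
```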
Figure 1: A simple example of a graph of contributors and contributions with edges for creatorship and assessments.

Fig. 1 shows a simple example of such a graph with two kinds of edges, representing creatorship and assessments. To determine the reputation or importance of the nodes, we can in the simplest case treat the two types of edges identically and rank the nodes by applying a network measure such as Eigenvector centrality (which is closely related to Google’s PageRank algorithm for ranking websites), as shown by the red numbers. The person at the top-left has a high reputation because he is endorsed by the person in the middle. The latter has a high reputation because her direct and indirect contributions were positively assessed by others (even though she has not received a direct assessment herself). The third person, on the right, however, has not contributed anything that was positively assessed by others (only by his own bot), and therefore his reputation is low. Of course, there are many possible variations and extensions, such as bidirectional contribution edges for the Eigenvector calculation, as indicated by the gray numbers. In general, because one cannot influence incoming links from the part of the network that is not under one’s control, there is no efficient way to game the system. The scalability of such algorithms in open and decentralized systems is demonstrated by their successful application in search engines and peer-to-peer systems I].
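The ranking step can be sketched in Python with a power iteration. Note that this sketch swaps in PageRank-style damping instead of plain Eigenvector centrality: on directed graphs without enough cycles, plain Eigenvector centrality can assign a score of zero to many nodes, and damping avoids that. The graph is hypothetical and only loosely follows the description of Fig. 1 (node names and edges are assumptions). Both edge types are treated identically, and each edge passes part of its source's score to its target.

```python
def reputation(edges: list[tuple[str, str]], damping: float = 0.85,
               iterations: int = 200) -> dict[str, float]:
    """PageRank-style scores; each edge (u, v) passes part of u's score to v."""
    nodes = sorted({node for edge in edges for node in edge})
    out: dict[str, list[str]] = {node: [] for node in nodes}
    for source, target in edges:
        out[source].append(target)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes without outgoing edges spread their score uniformly.
        dangling = sum(score[node] for node in nodes if not out[node])
        new = {node: (1 - damping + damping * dangling) / n for node in nodes}
        for source in nodes:
            for target in out[source]:
                new[target] += damping * score[source] / len(out[source])
        score = new
    return score


edges = [
    # "is contributed by": contribution -> contributor
    ("bot_mid", "p_mid"),
    ("data_mid", "bot_mid"),
    ("bot_right", "p_right"),
    ("data_right", "p_right"),
    # "gives positive assessment for": assessor -> assessed
    ("p_mid", "p_left"),
    ("p_left", "data_mid"),
    ("bot_right", "data_right"),  # p_right's work is assessed only by his own bot
]
ranks = reputation(edges)
for node, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node:12} {value:.3f}")
```

With a damping factor of 0.85, `p_mid` and `p_left` end up well ahead of `p_right`, whose only assessment comes from his own bot; self-endorsement adds little because it does not bring in score from outside his part of the graph.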

Bots could free scientists from routine tasks and therefore allow them to focus on the interesting questions. Furthermore, this approach could increase the value and appreciation of datasets and software as research products, and give due credit to their creators. With appropriate reputation mechanisms, this can be achieved in a fully open and decentralized environment.